value distribution
ALocalTemporalDifferenceCodeforDistributional ReinforcementLearning
However, since this decoder effectively approximates thenth derivative of the input vector, it is very sensitive to noise. In our framework, the input is often very noisy, since it corresponds to the converging points of different learning traces. In this section we describe two linear decoders that differ from that in [35] and are more noise-resilient. A.9 and A.10 is crucial for long temporal horizons, since regularization causes the overall magnitude of the recoveredฯ-space to decrease asฯ increases3. Normalization amends thedecreasing magnitude problem bymaking theฯ-space to sum to 1 for everyฯ.